Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement, in stages, the functionality required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation. Note that some sections of the implementation are optional and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.


Step 0: Load The Data

In [22]:
# Load pickled data
import pickle

# TODO: Fill this in based on where you saved the training and testing data

training_file = 'traffic-signs-data/train.p'
testing_file = 'traffic-signs-data/test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train, y_train = train['features'], train['labels']
X_test, y_test = test['features'], test['labels']

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 1D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES

Complete the basic data summary below.

In [23]:
### Replace each question mark with the appropriate value.
import numpy as np

print(X_train.shape)
# TODO: Number of training examples
n_train = X_train.shape[0]

# TODO: Number of testing examples.
n_test = X_test.shape[0]

# TODO: What's the shape of a traffic sign image?
image_shape = X_test.shape[1:3]

# TODO: How many unique classes/labels there are in the dataset.
n_classes = np.unique(y_train).shape[0]

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
(39209, 32, 32, 3)
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32)
Number of classes = 43

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.

In [5]:
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
# Visualizations will be shown in the notebook.
%matplotlib inline
total = 200
images = [X_train[np.random.randint(0, n_train)] for _ in range(total)]

fig = plt.figure(1, (32, 32))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(10, 20),
                 axes_pad=0,
)
for i in range(total):
    grid[i].imshow(images[i], cmap=plt.get_cmap('Greys_r'))  # The AxesGrid object works as a list of axes.
    grid[i].set_axis_off()
    #grid[i].xaxis.set_major_formatter(plt.NullFormatter())
    #grid[i].yaxis.set_major_formatter(plt.NullFormatter())
    
plt.show(block=True)
In [6]:
from random import shuffle

signs = np.genfromtxt("signnames.csv", dtype=None, delimiter=",", skip_header=1)
unique_images = [None for _ in range(signs.shape[0])]
numbers = []
values = 0

array = list(range(X_train.shape[0]))
shuffle(array)

for i in array:
    label = y_train[i]
    if label not in numbers:
        numbers.append(label)
        unique_images[label] = X_train[i]
        values += 1
    if values == signs.shape[0]:
        break

fig = plt.figure(1, (30, 30))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(11, 4),
                 axes_pad=0.4,
                 )

for i in range(len(unique_images)):
    #unique_images[i].shape = (32, 32)
    #print(unique_images[i].shape)
    grid[i].set_title(signs[i][1], fontdict=None, loc='center', color = "k")
    grid[i].imshow(unique_images[i], cmap=plt.get_cmap('Greys_r'))

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [24]:
### Preprocess the data here.
### Feel free to use as many code cells as needed.

### Imports and parameters
import tensorflow as tf
import cv2
import sklearn
import numpy as np

from tensorflow.contrib.layers import flatten
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
In [25]:
def grayscale(img):
    """Applies the Grayscale transform
    This will return an image with only one color channel
    but NOTE: to see the returned image as grayscale
    (assuming your grayscaled image is called 'gray')
    you should call plt.imshow(gray, cmap='gray')"""
    #print(img)
    return cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

def normalize_grayscale(image_data):
    a = -100.0
    b = 100.0
    grayscale_min = 0
    grayscale_max = 255
    return a + (((image_data - grayscale_min)*(b - a))/(grayscale_max - grayscale_min))


#f = lambda img: normalize_grayscale(grayscale(img))
f = lambda img: grayscale(img)

X_train = [f(img) for img in X_train]
X_test = [f(img) for img in X_test]

X_train = np.array(X_train)
X_train = np.reshape(X_train, (n_train, 32, 32, 1))

X_test = np.array(X_test)
X_test = np.reshape(X_test, (n_test, 32, 32, 1))

Question 1

Describe how you preprocessed the data. Why did you choose that technique?

Answer:

I applied two techniques:

  • Convert to grayscale:

I applied this technique to reduce the number of features used to train the neural network (color probably matters for predicting some traffic signs, but I chose grayscale to reduce the complexity of the model).

  • Min-max normalization to [-100.0, 100.0]

I tried this technique, but in the end I didn't use it because I saw no improvement in the validation score (I tried different scales: [-0.5, 0.5], [-1.0, 1.0], [-10, 10]; the best one was [-100.0, 100.0]).
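As a quick sanity check, here is a minimal, self-contained sketch of the min-max scaling described above (pure NumPy, independent of the notebook code), confirming that the endpoints of the pixel range map to the chosen bounds:

```python
import numpy as np

def min_max_scale(image_data, a=-100.0, b=100.0, x_min=0.0, x_max=255.0):
    """Linearly map pixel values from [x_min, x_max] to [a, b]."""
    return a + (image_data - x_min) * (b - a) / (x_max - x_min)

pixels = np.array([0.0, 127.5, 255.0])
print(min_max_scale(pixels))  # endpoints map to -100 and 100, midpoint to 0
```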

In [26]:
print(X_train.shape)
### Generate additional data (OPTIONAL!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
## 'fake' data generator
def image_transform(img, label, generate=1):
    rows, cols = img.shape[0], img.shape[1]
    images = []
    for _ in range(generate):
        # shift the brightness by a value in [-20, 20]
        scaled = np.clip(img.astype(np.int32) + np.random.randint(-20, 21), 0, 255).astype(np.uint8)
        # translate the image by [-2, 2] pixels in each direction
        M = np.float32([[1, 0, np.random.randint(-2, 3)], [0, 1, np.random.randint(-2, 3)]])
        translated = cv2.warpAffine(scaled, M, (cols, rows))
        # rotate the image by [-15, 15] degrees (transforms are chained: scale -> translate -> rotate)
        M = cv2.getRotationMatrix2D((cols/2, rows/2), np.random.randint(-15, 16), 1)
        rotated = cv2.warpAffine(translated, M, (cols, rows))
        rotated.shape = [32, 32, 1]
        images.append(rotated)
    return np.array(images), np.array([label] * generate)

total = 10
image = X_train[np.random.randint(0, n_train)]
images = image_transform(image, 4, total)[0]

fig = plt.figure(1, (32, 32))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(1, 10),
                 axes_pad=0,
)

images.shape = (total, 32, 32)
for i in range(total):
    grid[i].imshow(images[i], cmap=plt.get_cmap('Greys_r'))  # The AxesGrid object works as a list of axes.
    grid[i].set_axis_off()
    #grid[i].xaxis.set_major_formatter(plt.NullFormatter())
    #grid[i].yaxis.set_major_formatter(plt.NullFormatter())

print("Generated images")
plt.show(block=True)
print("Original image")
image.shape = (32, 32)
plt.imshow(image, cmap='gray')
(39209, 32, 32, 1)
Generated images
Original image
Out[26]:
<matplotlib.image.AxesImage at 0x7f6715c7d128>
In [27]:
## change the variable `action` to 'generate' to create the "fake" images; this process takes about 15-20 minutes

action = 'load'
if action == 'generate':
    ## generate images
    for i in range(n_train):
        if i % 1000 == 0:
            print("image:", i)
        images, labels = image_transform(X_train[i], y_train[i], 4)
        X_train = np.append(X_train, images, axis=0)
        y_train = np.append(y_train, labels, axis=0)
    pickle.dump({"features": X_train, "labels": y_train}, open("transform.p", "wb"))
    
if action == 'load':
    train_dataset = pickle.load(open("transform.p", "rb"))
    X_train, y_train = train_dataset["features"], train_dataset["labels"]
print(X_train.shape)
(196045, 32, 32, 1)
In [28]:
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
print(X_train.shape, X_validation.shape)
(156836, 32, 32, 1) (39209, 32, 32, 1)

Question 2

Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?

Answer:

I generated "fake" data by performing three transformations on the images (the transformations can be seen above):

  • Shift the brightness of the grayscale image (add a number in [-20, 20])
  • Translate the image within a range of [-2, 2] pixels
  • Rotate the image within a range of [-15, 15] degrees

ConvNets are already fairly robust to small transformations of the data, but adding these kinds of transformations to the dataset makes the learning process more robust, as Yann LeCun explains in this paper (in fact, this step helped me improve the validation accuracy [from about 0.96 to 0.99] and the test accuracy [from about 0.9 to 0.938]).

In the end I generated 4 "fake" images for each image in the dataset and then split the result into training (80%) and validation (20%); the test set remains unchanged.
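Since some classes have far more examples than others, one optional refinement (not used in this notebook, which calls `train_test_split` without it) is the `stratify` argument, which keeps the same class proportions in both splits. A small sketch with toy, imbalanced data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the real features/labels (assumption: not the sign data).
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)  # imbalanced classes, like the sign dataset

# stratify=y preserves the 80/20 class ratio in both the train and validation sets
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=42, stratify=y)
print(np.bincount(y_tr), np.bincount(y_val))  # [64 16] [16  4]
```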

In [29]:
### Define your architecture here.
### Feel free to use as many code cells as needed.
def LeNet(x, activation=tf.tanh, pooling=tf.nn.avg_pool, keep_prob=0.5):    
    # Arguments used for tf.truncated_normal, randomly defines variables for the weights and biases for each layer
    mu = 0
    sigma = 0.1
    
    # TODO: Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean=mu, stddev=sigma))
    conv1_B = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=(1, 1, 1, 1), padding='VALID') + conv1_B

    # TODO: Activation.
    conv1 = activation(conv1)

    # TODO: Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = pooling(conv1, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='VALID')

    # TODO: Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean=mu, stddev=sigma))
    conv2_B = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=(1, 1, 1, 1), padding='VALID') + conv2_B
    
    # TODO: Activation.
    conv2 = activation(conv2)

    # TODO: Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = pooling(conv2, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='VALID')

    # TODO: Flatten. Input = 5x5x16. Output = 400.
    flat = flatten(conv2)
    
    # TODO: Layer 3: Fully Connected. Input = 400. Output = 350.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 350), mean=mu, stddev=sigma))
    fc1_B = tf.Variable(tf.zeros(350))
    fc1 = tf.matmul(flat, fc1_W) + fc1_B
    
    # TODO: Activation.
    fc1 = activation(fc1)

    # TODO: Layer 4: Fully Connected. Input = 350. Output = 120.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(350, 120), mean=mu, stddev=sigma))
    fc2_B = tf.Variable(tf.zeros(120))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_B
    
    # TODO: Activation.
    fc2 = tf.nn.relu(fc2)
    #fc2 = fc1

    # TODO: Layer 5: Fully Connected. Input = 120. Output = 43.
    fc3_W = tf.Variable(tf.truncated_normal(shape=(120, n_classes), mean=mu, stddev=sigma))
    fc3_B = tf.Variable(tf.zeros(n_classes))
    logits = tf.matmul(fc2, fc3_W) + fc3_B
    
    return logits

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

The ConvNet architecture is the same as the LeNet architecture, with small changes:

  • Convolution 2D:

    • Stride: 1
    • Padding: valid
    • Weights: 150 (5x5x1 filters, depth 6)
    • Activation: tanh
    • Bias size: 6
  • Pooling: average

    • Stride: 2
  • Convolution 2D:

    • Stride: 1
    • Padding: valid
    • Weights: 2400 (5x5x6 filters, depth 16)
    • Activation: tanh
    • Bias size: 16
  • Pooling: average

    • Stride: 2
  • Flatten:

    • Output size: 400
  • Fully connected:

    • Weights: 140000 (400 x 350)
    • Bias size: 350
    • Activation: tanh
  • Fully connected:

    • Weights: 42000 (350 x 120)
    • Bias size: 120
    • Activation: ReLU
  • Fully connected (output layer):

    • Weights: 5160 (120 x 43)
    • Bias size: 43
    • Activation: none (the raw logits feed the softmax in the loss)

The changes made to the LeNet architecture:

  • Pooling: max to average
  • Activation: ReLU to tanh (in the convolutional layers and the first fully connected layer)
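The per-layer sizes listed above are weight counts (biases listed separately); they can be sanity-checked with a quick computation:

```python
# Weight counts per layer for the architecture above: each convolution has
# (filter height * filter width * input depth * output depth) weights, and
# each fully connected layer has (inputs * outputs) weights.
layers = {
    "conv1 (5x5x1 -> 6)":  5 * 5 * 1 * 6,
    "conv2 (5x5x6 -> 16)": 5 * 5 * 6 * 16,
    "fc1 (400 -> 350)":    400 * 350,
    "fc2 (350 -> 120)":    350 * 120,
    "fc3 (120 -> 43)":     120 * 43,
}
for name, count in layers.items():
    print(name, count)
print("total weights:", sum(layers.values()))  # 189710
```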
In [30]:
### Train your model here.
### Feel free to use as many code cells as needed.
EPOCHS = 30
BATCH_SIZE = 250
rate = 0.0001
try_test = True

## placeholders for data
x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, n_classes)

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_y)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train)
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
            
        validation_accuracy = evaluate(X_validation, y_validation)
        print("EPOCH {} ...".format(i+1))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()
    
    if try_test:
        test_accuracy = evaluate(X_test, y_test)
        #print("EPOCH {} ...".format(i+1))
        print("Test Accuracy = {:.3f}".format(test_accuracy))
        print()
    saver.save(sess, './traffic')
    print("Model saved")
Training...

EPOCH 1 ...
Validation Accuracy = 0.842

EPOCH 2 ...
Validation Accuracy = 0.914

EPOCH 3 ...
Validation Accuracy = 0.943

EPOCH 4 ...
Validation Accuracy = 0.959

EPOCH 5 ...
Validation Accuracy = 0.966

EPOCH 6 ...
Validation Accuracy = 0.973

EPOCH 7 ...
Validation Accuracy = 0.978

EPOCH 8 ...
Validation Accuracy = 0.981

EPOCH 9 ...
Validation Accuracy = 0.984

EPOCH 10 ...
Validation Accuracy = 0.986

EPOCH 11 ...
Validation Accuracy = 0.988

EPOCH 12 ...
Validation Accuracy = 0.991

EPOCH 13 ...
Validation Accuracy = 0.992

EPOCH 14 ...
Validation Accuracy = 0.992

EPOCH 15 ...
Validation Accuracy = 0.993

EPOCH 16 ...
Validation Accuracy = 0.994

EPOCH 17 ...
Validation Accuracy = 0.995

EPOCH 18 ...
Validation Accuracy = 0.995

EPOCH 19 ...
Validation Accuracy = 0.995

EPOCH 20 ...
Validation Accuracy = 0.996

EPOCH 21 ...
Validation Accuracy = 0.996

EPOCH 22 ...
Validation Accuracy = 0.996

EPOCH 23 ...
Validation Accuracy = 0.996

EPOCH 24 ...
Validation Accuracy = 0.997

EPOCH 25 ...
Validation Accuracy = 0.997

EPOCH 26 ...
Validation Accuracy = 0.997

EPOCH 27 ...
Validation Accuracy = 0.997

EPOCH 28 ...
Validation Accuracy = 0.997

EPOCH 29 ...
Validation Accuracy = 0.997

EPOCH 30 ...
Validation Accuracy = 0.997

Test Accuracy = 0.937

Model saved

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

I trained the model with the following parameters:

  • Optimizer: Adam
  • Batch size: 250
  • Epochs: 30
  • Learning rate: 0.0001
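From these parameters, the amount of work per epoch follows directly: the training loop above steps through the 156,836 examples left after the 80/20 split in increments of `BATCH_SIZE`, so the last batch is partial:

```python
import math

n_train_examples = 156836  # training-set size after the 80/20 split above
BATCH_SIZE = 250
EPOCHS = 30

steps_per_epoch = math.ceil(n_train_examples / BATCH_SIZE)
print(steps_per_epoch)           # 628 batches per epoch (the last one holds 86 examples)
print(steps_per_epoch * EPOCHS)  # 18840 optimizer updates over the whole run
```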

Question 5

What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.

Answer:

I applied a trial-and-error process over the following parameters:

  • Batch size: (50, 100, 150, 200, 250)
  • Epochs: (10, 20, 30, 40, 50)
  • Learning Rate: (1, 0.1, 0.001, 0.0001)
  • Optimizer: (RMSProp, AdaDelta, Adam)

At the beginning I varied the batch size, using a learning rate of 0.001 and 10 epochs, and compared the validation accuracy; the best results were between 50 and 250, so I chose 250. Then I chose the number of epochs so that the last evaluations of the validation accuracy no longer changed much, settling on 30. After that I started changing the learning rate (I had used 0.001, the suggested rate for LeNet, while establishing the other parameters); the best one was 0.0001. Finally, I tried other optimizers but didn't find one better than Adam.

For the architecture I used LeNet because it is well known in the field of image recognition. It's usually better to start from a known architecture, because building a new one can take a lot of effort before reaching a good solution.

I made some small changes to the network: I saw in the paper that they used tanh for their problem, and I also tried average pooling, which gave better results than max pooling.
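The trial-and-error search described above can be sketched as a small grid search; `train_and_validate` below is a hypothetical placeholder for one full training run (it is not part of the notebook):

```python
from itertools import product

batch_sizes = [50, 100, 150, 200, 250]
learning_rates = [1, 0.1, 0.001, 0.0001]

def train_and_validate(batch_size, rate):
    # placeholder: a real implementation would train the network with these
    # parameters and return the validation accuracy
    return 0.0

combos = list(product(batch_sizes, learning_rates))
best_batch, best_rate = max(combos, key=lambda p: train_and_validate(*p))
print(len(combos))  # 20 training runs for a full grid over these two parameters
```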


Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [19]:
### Load the images and plot them here.
### Feel free to use as many code cells as needed.
import os

predictions = tf.argmax(logits, 1)
rows_cols = (3, 3)
figure_size = (15, 15)

test_images = [image for image in os.listdir("./traffic-signs-data") if image.startswith("scaled-")]
original_images = [image for image in os.listdir("./traffic-signs-data") if image.startswith("sign")]

fig = plt.figure(1, figure_size)
grid = ImageGrid(fig, 111,
                 nrows_ncols=rows_cols,
                 axes_pad=0.4,
                 )

images = []

print("Scaled images with predictions")
with tf.Session() as sess:
    saver.restore(sess, "./traffic")
    for i, image in enumerate(test_images):
        img = cv2.imread("./traffic-signs-data/" + image, cv2.IMREAD_COLOR)
        grid[i].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        images.append(np.resize(f(img), (32, 32, 1)))
        
    images = np.array(images)
    pred = sess.run(predictions, feed_dict={x: images})
    
    for i, p in enumerate(pred):
        grid[i].set_title(signs[p][1], fontdict=None, loc='center', color = "k")
plt.show(block=True)
print("original images")
fig = plt.figure(1, figure_size)
grid = ImageGrid(fig, 111,
                 nrows_ncols=rows_cols,
                 axes_pad=0.4,
                 )

for i, image in enumerate(original_images):
    img = cv2.imread("./traffic-signs-data/" + image, cv2.IMREAD_COLOR)
    grid[i].imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), cmap=plt.get_cmap('Greys_r'))
Scaled images with predictions
original images

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.

Answer:

The problem with these images is that they are rescaled, so a lot of information about the images is lost, which makes the classification task hard.
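To make the loss of information concrete, here is a small illustration using naive strided downsampling (`cv2.resize` interpolates rather than discarding pixels outright, but the output holds the same number of values either way):

```python
import numpy as np

# A 128x128 "image" downsampled 4x by striding keeps only 1 pixel in 16.
img = np.arange(128 * 128).reshape(128, 128)
small = img[::4, ::4]
print(small.shape)            # (32, 32)
print(small.size / img.size)  # 0.0625 -> 93.75% of the original pixels are gone
```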

Question 7

Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this is to check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.

NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.

Answer:

The accuracy on these new test images is 0.11
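As an illustration of how that number is computed, here is a hedged sketch with placeholder labels (the true labels for the nine web images are not listed in the notebook; the arrays below are hypothetical, arranged so exactly one of nine predictions matches):

```python
import numpy as np

# Hypothetical class ids for nine images: one correct prediction out of nine.
pred = np.array([18, 8, 38, 28, 9, 13, 38, 11, 40])
true = np.array([0,  0,  0,  0, 0,  0, 38,  0,  0])

accuracy = np.mean(pred == true)
print(round(float(accuracy), 2))  # 0.11
```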

In [21]:
### Visualize the softmax probabilities here.
### Feel free to use as many code cells as needed.

softmax_output = tf.nn.softmax(logits)

def bar_diagram(probs, out, image, k=5):
    ## print top-k
    arr = np.array(probs)
    classes = arr.argsort()[-k:][::-1]
    print("top " + str(k) + ":")
    for i in range(classes.shape[0]):
        print(i + 1, ".", signs[classes[i]][1])
    
    fig = plt.figure(1, (15, 15))
    
    #fig, ax = plt.subplots()
    n_groups = len(probs)
    
    index = np.arange(n_groups)
    bar_width = 0.35

    opacity = 0.4
    error_config = {'ecolor': '0.3'}
    
    plt.subplot(221)

    rects1 = plt.bar(index, probs, bar_width,
                     alpha=opacity,
                     color='b',
                     error_kw=error_config,
                     label='Probabilities per class')

    plt.xlabel('Class')
    plt.ylabel('Probabilities')
    plt.title('Softmax (Predict: ' + str(signs[out][1]) + ")")
    #plt.xticks(index + bar_width / 2, ('A', 'B', 'C', 'D', 'E'))
    plt.legend()
    plt.tight_layout()
    
    plt.subplot(222)
    plt.imshow(image, cmap='gray')
    plt.show(block=True)

with tf.Session() as sess:
    saver.restore(sess, "./traffic")
    pred = sess.run(softmax_output, feed_dict={x: images})
    output = sess.run(predictions, feed_dict={x: images})
    i = 1
    for probs, out, image in zip(pred, output, images):
        #img = cv2.imread("./traffic-signs-data/" + image, cv2.IMREAD_COLOR)
        print("image", i)
        i += 1
        img = np.resize(image, (32, 32))
        bar_diagram(probs, out, img)
    
image 1
top 5:
1 . b'General caution'
2 . b'Roundabout mandatory'
3 . b'Right-of-way at the next intersection'
4 . b'Speed limit (60km/h)'
5 . b'Children crossing'
image 2
top 5:
1 . b'Speed limit (120km/h)'
2 . b'Wild animals crossing'
3 . b'Road narrows on the right'
4 . b'Speed limit (70km/h)'
5 . b'Beware of ice/snow'
image 3
top 5:
1 . b'Keep right'
2 . b'Turn left ahead'
3 . b'Go straight or right'
4 . b'No entry'
5 . b'Speed limit (60km/h)'
image 4
top 5:
1 . b'Children crossing'
2 . b'Right-of-way at the next intersection'
3 . b'Speed limit (30km/h)'
4 . b'Turn left ahead'
5 . b'No entry'
image 5
top 5:
1 . b'No passing'
2 . b'Priority road'
3 . b'No passing for vehicles over 3.5 metric tons'
4 . b'Speed limit (100km/h)'
5 . b'Speed limit (80km/h)'
image 6
top 5:
1 . b'Yield'
2 . b'No entry'
3 . b'Ahead only'
4 . b'Turn left ahead'
5 . b'Children crossing'
image 7
top 5:
1 . b'Keep right'
2 . b'Turn left ahead'
3 . b'Yield'
4 . b'Ahead only'
5 . b'No passing for vehicles over 3.5 metric tons'
image 8
top 5:
1 . b'Right-of-way at the next intersection'
2 . b'General caution'
3 . b'Speed limit (120km/h)'
4 . b'Road narrows on the right'
5 . b'Dangerous curve to the left'
image 9
top 5:
1 . b'Roundabout mandatory'
2 . b'Traffic signals'
3 . b'Speed limit (80km/h)'
4 . b'Dangerous curve to the right'
5 . b'Go straight or left'

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the correspoding class ids.

Take this numpy array as an example:

# (5, 6) array
a = np.array([[ 0.24879643,  0.07032244,  0.12641572,  0.34763842,  0.07893497,
         0.12789202],
       [ 0.28086119,  0.27569815,  0.08594638,  0.0178669 ,  0.18063401,
         0.15899337],
       [ 0.26076848,  0.23664738,  0.08020603,  0.07001922,  0.1134371 ,
         0.23892179],
       [ 0.11943333,  0.29198961,  0.02605103,  0.26234032,  0.1351348 ,
         0.16505091],
       [ 0.09561176,  0.34396535,  0.0643941 ,  0.16240774,  0.24206137,
         0.09155967]])

Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:

TopKV2(values=array([[ 0.34763842,  0.24879643,  0.12789202],
       [ 0.28086119,  0.27569815,  0.18063401],
       [ 0.26076848,  0.23892179,  0.23664738],
       [ 0.29198961,  0.26234032,  0.16505091],
       [ 0.34396535,  0.24206137,  0.16240774]]), indices=array([[3, 0, 5],
       [0, 1, 4],
       [0, 5, 1],
       [1, 3, 5],
       [1, 4, 3]], dtype=int32))

Looking just at the first row we get [ 0.34763842, 0.24879643, 0.12789202], you can confirm these are the 3 largest probabilities in a. You'll also notice [3, 0, 5] are the corresponding indices.
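The same top-k selection can be reproduced in plain NumPy as a check of the first row above:

```python
import numpy as np

a = np.array([[0.24879643, 0.07032244, 0.12641572, 0.34763842, 0.07893497, 0.12789202]])
k = 3

idx = np.argsort(a, axis=1)[:, ::-1][:, :k]   # indices of the k largest, descending
vals = np.take_along_axis(a, idx, axis=1)     # the matching probabilities
print(idx[0])   # [3 0 5]
print(vals[0])  # [0.34763842 0.24879643 0.12789202]
```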

Answer:

Image 1:

The prediction is wrong, but the correct class is in the top 5, and the model is certain of its prediction.

Image 2, 3, 4, 5, 9:

The prediction is wrong, the correct class isn't in the top 5, and the model is certain of its prediction. The reason may be the image resizing process; the model is very confident in these predictions (probability above 0.8).

Image 6, 8:

These images present the same problems as the previous ones, but the difference is that the model is not very confident in its predictions (probability below 0.8).

Image 7:

The image is correctly predicted, but the model is not really confident in the prediction (the probability is below 0.8).

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In [ ]: